To start our preliminary analysis of the data, we plot how common each glitch class is, relative to the others, in each interferometer:
First of all, we can see that several glitch classes are specific to
one interferometer (although some may just have so few examples in the
other that they don’t show up in this graph). We can also see that
Hanford generally has more glitches than Livingston, and that Blips,
Koi Fish, and Low-Frequency Bursts appear to be the glitches with the
most examples. To check whether this is accurate, we now calculate
summary statistics for each glitch class: the number of glitches of
that class detected in each interferometer (columns n_H1 and n_L1)
and the mean of each predictor variable for that class.
## # A tibble: 22 × 8
## label n_H1 n_L1 snr peak_freq central_freq duration bandwidth
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1080Lines 327 1 10.2 1111 2961 0.85 4730
## 2 1400Ripples 0 81 10.9 1527 1846 0.15 1654
## 3 Air_Compressor 55 3 8.7 48 320 0.41 567
## 4 Blip 1453 368 22.8 199 839 0.27 1595
## 5 Chirp 28 32 13.6 141 264 0.29 461
## 6 Extremely_Loud 266 181 2416. 140 2673 8.17 5311
## 7 Helix 3 276 8.8 134 263 0.09 326
## 8 Koi_Fish 517 189 139. 157 1834 1.75 3629
## 9 Light_Modulation 511 1 34.6 105 2000 2.34 3966
## 10 Low_Frequency_B… 166 455 29.9 16 2611 2.91 5208
## 11 Low_Frequency_L… 79 368 23.1 12 2630 3.94 5243
## 12 No_Glitch 91 59 9.3 183 1601 1.95 2915
## 13 None_of_the_Abo… 51 30 45.3 170 1744 2.72 3436
## 14 Paired_Doves 27 0 33.4 41 1270 0.42 2505
## 15 Power_Line 273 176 11.3 62 733 0.75 1367
## 16 Repeating_Blips 230 33 29.2 200 1650 0.31 3214
## 17 Scattered_Light 385 58 16.4 30 2175 2.61 4319
## 18 Scratchy 90 247 8.6 153 1223 1.45 2269
## 19 Tomte 61 42 16.2 47 833 0.73 1622
## 20 Violin_Mode 141 271 13.4 1673 1742 0.29 2637
## 21 Wandering_Line 42 0 27.8 667 2127 6.05 3929
## 22 Whistle 2 297 9.5 1093 2690 0.59 4788
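
As a rough illustration of how a per-class summary like the one above can be computed, here is a minimal sketch in Python using only the standard library. The original analysis appears to have been done in R, so this is not the author's code; the rows, their values, and the `summarise` helper are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (label, ifo, snr, peak_freq, duration). Values are invented.
glitches = [
    ("Blip", "H1", 20.0, 200.0, 0.25),
    ("Blip", "L1", 24.0, 180.0, 0.30),
    ("Koi_Fish", "H1", 140.0, 150.0, 1.50),
    ("Whistle", "L1", 9.0, 1100.0, 0.60),
    ("Whistle", "L1", 10.0, 1000.0, 0.50),
]

def summarise(rows):
    """Group rows by label, count them per interferometer, and average the predictors."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)
    summary = {}
    for label, members in sorted(groups.items()):
        summary[label] = {
            "n_H1": sum(1 for m in members if m[1] == "H1"),
            "n_L1": sum(1 for m in members if m[1] == "L1"),
            "snr": mean(m[2] for m in members),
            "peak_freq": mean(m[3] for m in members),
            "duration": mean(m[4] for m in members),
        }
    return summary

summary = summarise(glitches)
for label, stats in summary.items():
    print(label, stats)
```

Counting per interferometer and averaging in the same pass keeps each class on a single row, which is the layout of the table above.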
From this summary data, we can see that the machine learning system has classed several glitches into categories from the “wrong” observatory. There are two possible explanations. First, enough glitches happen that, even if a certain glitch type isn’t present in one interferometer, a burst of noise with a random shape can fit poorly into the “correct” interferometer’s categories while happening to resemble a glitch from another category, and so get classified as that. Alternatively, since “None of the Above” is a category, the training data (classified by citizen scientists) appears to be included in this dataset, in which case these mistakes could be human errors already present in the training data. We can also see that Koi Fish, one of the most prominent types of glitch, is the loudest standard class (other than Extremely Loud glitches, which are the loudest by definition), while Scratchy, Helix, and Air Compressor glitches are the quietest, even quieter on average than the ‘No Glitch’ category.
To begin our analysis of the data points themselves (rather than just
the averages for each glitch class), we plot bandwidth
against duration, with the colour of each point representing its
label, to see how well we can group glitch classes by the general
dimensions of the signal:
This image is a bit hard to read, since the labels take up so much room, and since there are so many data points on the graph; still, one can tell that in the lower part of the distribution the glitch population is dominated by the blue and purple colours of Koi Fish, Low-Frequency Bursts, and Low-Frequency Lines, with a sudden stripe of green (and assorted other colours) at the very bottom. Zooming in on the y-axis to glitches with durations less than 5 seconds gives us the following plot:
With this zoomed-in visualisation, we can see that the data has some artefacts that cause the values to line up on a grid. Ignoring this, we notice the large cluster of green “Blip”-type glitches at the bottom, especially prominent in the bottom-left corner, as well as two major populations of 1080 Lines: one forming linear patterns in the lower-right-hand corner of the distribution, and the other sitting on the left-hand side of the Blip distribution. We also see a population of either Violin Modes or Wandering Lines in the center of the lower edge of the plot, its density peaking at bandwidths between 3000 and 4000. We can now also see that many Koi Fish glitches are spread through the background distribution; in the previous plot, their colour was hard to tell apart from that of the Low-Frequency Bursts and Lines. Finally, Koi Fish and Low-Frequency Bursts seem to dominate most of the chart, with assorted Scattered Light glitches among them.
Meanwhile, if we instead plot peak frequency by duration (using the same limit on duration), we get the following graph:
Here, we can clearly see spikes of 1080-Hz lines and 1400-Hz ripples at their respective frequencies, as well as several spikes of violin modes at frequencies just over 1000Hz, 1500Hz, and 2000Hz, among a background composed mostly of Whistles. Moving into lower frequencies, we see a cloud of Blips underneath another blob, mostly composed of Koi Fish. There are several distributions of other glitches at lower frequencies as well, but these are harder to see clearly because of how little room they take up on the graph; to solve this, we use the same transformation as the Gravity Spy spectrograms do: taking the logarithm of the frequency values.
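
To see why the logarithm helps, here is a small sketch in Python (the original analysis appears to use R). It shows that on a logarithmic axis the 10-20 Hz range is as wide as the 1000-2000 Hz range. The base of the logarithm (10 here) and the sample frequencies are assumptions made for illustration.

```python
import math

def log_frequency(freq_hz):
    """Map a frequency in Hz onto a base-10 logarithmic axis (base chosen for illustration)."""
    return math.log10(freq_hz)

# Illustrative peak frequencies in Hz, not taken from the dataset.
low_band = (10.0, 20.0)
high_band = (1000.0, 2000.0)

# Width of each band on the linear axis and on the logarithmic axis.
linear_widths = (low_band[1] - low_band[0], high_band[1] - high_band[0])
log_widths = (
    log_frequency(low_band[1]) - log_frequency(low_band[0]),
    log_frequency(high_band[1]) - log_frequency(high_band[0]),
)

print("linear widths:", linear_widths)
print("log widths:", log_widths)
```

On the linear axis the low band is a hundred times narrower than the high band, which is why the low-frequency glitch classes are squeezed together before the transformation.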
In this new graph, while we can still see the high-frequency spikes, we can now also see many lower-frequency trends (as well as gridline textures similar to those in the previous diagrams). One of the most striking, in my opinion, is the line at 20Hz that separates the low-frequency lines and bursts from the scattered light glitches. There is a similar line on the other side of the scattered light glitches which separates them from most other glitches (although there is a small area outside this line where scattered light glitches are mixed with other glitch types, surprisingly enough still bounded by vertical lines). I have outlined these areas in the following plot:
Moving back to the unmarked graph, we can see the 60Hz Power Line glitches as a line of orange-coloured points around the 60Hz-line, and the Air Compressor glitches as a similar, yellow line at around 45Hz. We also see that in this graph, blips are mostly found in a triangle from 40Hz to 700Hz, and with durations less than 1 second, with a cluster of Helixes in the center. The area above this triangle has a background patterned with the dark blues of Koi Fish and Light Modulation, the cyan colour of Scratchy glitches, and the pale blue of Tomtes. Restricting our graph to these types of glitches to get a better look at their distributions, we get the following graph:
Here, we can tell that Light Modulation has data points all across the graph, from around (1000, 0) to around (11, 5), where coordinates are given as (peak frequency in Hz, duration in seconds), while Tomtes are fairly localised between (32, 0) and (64, 1.3), with few exceptions. Helices are indeed clustered in the center of the Blip cluster, which is in most cases visibly separate from the Koi Fish cluster, with Scratchy glitches found throughout both.
Next, we plot SNR against duration and against peak frequency,
and analyse the resulting graphs:
From the first plot, we can see that, while Blips, Koi Fish, and
Extremely Loud glitches form visually distinct categories, most of the
other glitch classes occupy the same region as each other. The second
plot (which is essentially a zoomed-out version of the first)
emphasises these four distributions, while also showing a notable second
group of Extremely Loud glitches that intersects the ‘other’
distribution. The third plot, however, is arguably the easiest one so
far on which to tell the glitch classes apart: Scratchy has a nearly
distinct region (slightly overlapping Helix and Blips), as does
Scattered Light (which, surprisingly, overlaps most with Tomtes), and
most of the clusters from the peak frequency-duration plot are visible
too, although this plot seems to mix up the other low-frequency
glitches more than the original graph did.
To see whether we can combine all three of these variables into a single
plot, we create a 3d visualisation of the data:
This interactive 3d chart is especially useful here, since we can
isolate combinations of glitches and see the differences between them.
For example, Blips and Koi Fish can be seen not only to have a
nearly-quadratic boundary in the peak_frequency-snr plane, but also to
differ in duration: Koi Fish generally last longer than Blips,
sometimes dramatically so.
We also create charts of these three variables plotted against amplitude.
Here, we won’t note specific glitch distributions, but one may notice
that the amplitude-to-SNR graph has a linear lower boundary, as well as
several diagonal lines running parallel to that boundary. That there is
a relation here is unsurprising, as the SNR calculation includes
the signal’s strength; however, the exact relation is unclear. In
addition, the amplitude-to-peak-frequency graph has a lower
boundary with a shape familiar to anyone analysing gravitational-wave
data: the amplitude spectral density (ASD) versus frequency graph of a
detector (specifically, the LIGO O1 detectors): (Image from the
Gravitational Wave Open Science Center: https://gwosc.org/o1speclines/)
That the glitches all occur above this line is unsurprising: the curve gives the detector’s noise level at each frequency, and therefore the minimum amplitude a signal needs in order to be detected there, so glitches must have amplitudes higher than this curve to be directly detected.
Next, we are going to manually create a ‘model’ to attempt to predict
the glitch type based on these variables, as well as ifo.
Unlike an actual model, we will not be predicting all values
simultaneously, but will have a more complex elimination approach to the
data, which we will create manually.
To begin, we create a new dataframe from which we remove the two
classes whose glitches consist of multiple bliplets and resemble other
glitch classes (Repeating Blips and Light Modulation). We
then add a new variable, predicted_label, for the predicted
glitch classes, and classify all glitches with SNRs higher than 500 as
Extremely Loud, since this captures all of the
high-peak-frequency population of Extremely Loud glitches, as well as
much of the lower-frequency population, while excluding as many other
glitches as possible.
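
The filtering and thresholding step described above might look like the following sketch, written in Python rather than the R the analysis appears to use. The column names `label`, `snr`, and `predicted_label` follow the text; the `prepare` helper and the sample rows are hypothetical.

```python
# Classes dropped before modelling, as described in the text.
EXCLUDED = {"Repeating_Blips", "Light_Modulation"}
# SNR above which a glitch is labelled Extremely Loud (threshold taken from the text).
SNR_THRESHOLD = 500.0

def prepare(rows):
    """Drop the excluded classes and pre-label very loud glitches."""
    kept = []
    for row in rows:
        if row["label"] in EXCLUDED:
            continue
        new_row = dict(row)  # copy, so the input rows are left untouched
        # None means "not decided yet": the fitted model fills these in later.
        new_row["predicted_label"] = (
            "Extremely_Loud" if row["snr"] > SNR_THRESHOLD else None
        )
        kept.append(new_row)
    return kept

# Hypothetical rows with invented values.
sample = [
    {"label": "Extremely_Loud", "snr": 2400.0},
    {"label": "Koi_Fish", "snr": 139.0},
    {"label": "Repeating_Blips", "snr": 29.0},
]
prepared = prepare(sample)
print(prepared)
```

Leaving `predicted_label` empty for everything under the threshold keeps this rule separate from whatever model is applied afterwards.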
Next, we remove variables not being used in our analysis of the data.
We then create a multinomial model, fit it to the data of the training set, and use the resulting fit to predict the testing dataset. This generates a new list of predictions as to the glitch classes.
We then take these predictions and graph them for comparison with the initial graph, and print the fitted coefficients of the model.
## # A tibble: 18 × 7
## y.level `(Intercept)` snr amplitude ifoL1 peak_frequency_log duration
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1080Lines -1.65 0.0204 -2.57e-17 -15.7 0.869 0.994
## 2 Blip 53.4 0.0505 1.51e-16 -12.0 -4.75 -5.07
## 3 Violin_Mo… -35.4 0.0676 4.05e-16 -6.65 4.04 -1.35
## 4 Scattered… 73.2 -0.0121 -2.66e-15 -12.5 -8.82 1.25
## 5 Power_Line 64.0 -0.0195 -1.38e-15 -11.0 -6.71 -0.596
## 6 Koi_Fish 53.7 0.0635 -1.42e-15 -11.1 -5.63 0.847
## 7 Scratchy 45.1 -0.123 -4.19e-16 -10.3 -3.98 1.20
## 8 Tomte 76.1 0.0632 -7.36e-16 -10.6 -8.97 -3.19
## 9 Chirp 50.5 0.0427 1.23e-16 -10.7 -4.87 -4.29
## 10 Wandering… 17.1 0.0262 1.32e-16 -19.7 -1.30 1.35
## 11 Extremely… 60.1 0.0708 -3.21e-15 -9.78 -7.40 1.32
## 12 No_Glitch 61.3 -0.0256 -5.31e-16 -10.8 -6.71 1.22
## 13 Low_Frequ… 91.9 0.0551 -9.29e-15 -11.1 -13.7 1.12
## 14 Paired_Do… 78.5 0.0700 1.54e-16 -15.6 -9.69 -2.65
## 15 Low_Frequ… 92.0 0.0683 1.83e-14 -11.3 -13.4 0.800
## 16 Helix 62.8 0.0733 3.07e-16 -7.04 -6.60 -14.5
## 17 Air_Compr… 80.3 0.0719 2.49e-17 -17.3 -9.49 -6.44
## 18 1400Rippl… -30.2 0.0692 7.25e-17 -5.72 3.36 -2.45
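
The table has 18 rows for 19 classes, which is what one would expect from a baseline-category multinomial logit: each row holds the coefficients of one class relative to a reference class whose score is fixed at zero. Assuming that this is the form of the model, the sketch below shows in Python how such coefficients turn into class probabilities. The coefficients and the glitch are toy values, not rows from the table, and `class_probabilities` is a hypothetical helper.

```python
import math

def class_probabilities(coefficients, features, reference="reference_class"):
    """Softmax over linear scores, with the reference class fixed at score 0."""
    scores = {reference: 0.0}
    for label, beta in coefficients.items():
        scores[label] = beta["(Intercept)"] + sum(
            beta[name] * value for name, value in features.items()
        )
    top = max(scores.values())  # subtract the maximum for numerical stability
    weights = {label: math.exp(score - top) for label, score in scores.items()}
    total = sum(weights.values())
    return {label: weight / total for label, weight in weights.items()}

# Toy coefficients, invented for illustration only.
toy_coefficients = {
    "class_A": {"(Intercept)": 1.0, "snr": 0.05, "duration": -2.0},
    "class_B": {"(Intercept)": -1.0, "snr": 0.0, "duration": 1.0},
}
toy_glitch = {"snr": 20.0, "duration": 0.5}

probabilities = class_probabilities(toy_coefficients, toy_glitch)
predicted = max(probabilities, key=probabilities.get)
print(probabilities, predicted)
```

Under this reading, a large negative coefficient (such as the duration coefficient for Helix in the table) means that the class quickly becomes unlikely as that variable grows.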
We note that the model fails to predict 1400 ripples accurately (the
only predicted 1400 ripple isn’t even near 1400Hz), overpredicts
violin modes, and distorts the lower-left area of the graph; it also
never predicts the two glitch types that we filtered out (Repeating
Blips and Light Modulation), as expected. However, we note
that, visually, the model does appear to predict the ranges of
scattered light, blips, koi fish, and helices fairly accurately. We now
test the model’s predictions: we compute the accuracy for each glitch
class and create a confusion matrix for the data, in which rows are the
true labels and columns are the predicted labels.
label | 1080Lines | Air_Compressor | Blip | Extremely_Loud | Helix | Koi_Fish | Low_Frequency_Burst | Low_Frequency_Lines | No_Glitch | Power_Line | Scattered_Light | Scratchy | Tomte | Violin_Mode | Wandering_Line | Whistle |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1080Lines | 92 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 6 | 1 | 0 |
1400Ripples | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 22 | 0 | 0 |
Air_Compressor | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 15 | 0 | 0 | 0 | 0 | 0 | 0 |
Blip | 1 | 2 | 509 | 0 | 18 | 3 | 0 | 0 | 0 | 2 | 0 | 6 | 0 | 0 | 0 | 6 |
Chirp | 0 | 0 | 15 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
Extremely_Loud | 0 | 0 | 0 | 118 | 0 | 4 | 6 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
Helix | 0 | 0 | 13 | 0 | 70 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
Koi_Fish | 3 | 0 | 13 | 10 | 0 | 177 | 3 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 |
Low_Frequency_Burst | 1 | 1 | 0 | 2 | 0 | 0 | 161 | 13 | 0 | 0 | 6 | 0 | 0 | 0 | 0 | 0 |
Low_Frequency_Lines | 0 | 0 | 0 | 0 | 0 | 0 | 80 | 45 | 0 | 0 | 7 | 0 | 0 | 0 | 0 | 0 |
No_Glitch | 1 | 0 | 4 | 0 | 0 | 0 | 7 | 1 | 1 | 12 | 11 | 4 | 0 | 2 | 0 | 0 |
Paired_Doves | 0 | 5 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
Power_Line | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 133 | 0 | 0 | 0 | 0 | 0 | 0 |
Scattered_Light | 0 | 0 | 0 | 3 | 0 | 2 | 1 | 0 | 5 | 1 | 123 | 0 | 0 | 0 | 0 | 0 |
Scratchy | 0 | 0 | 26 | 0 | 10 | 0 | 0 | 0 | 2 | 2 | 0 | 60 | 0 | 0 | 0 | 0 |
Tomte | 0 | 0 | 3 | 0 | 2 | 0 | 0 | 0 | 0 | 17 | 3 | 0 | 5 | 0 | 0 | 0 |
Violin_Mode | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 109 | 0 | 19 |
Wandering_Line | 7 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 2 | 0 |
Whistle | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 0 | 25 | 0 | 58 |
## # A tibble: 19 × 2
## # Groups: label [19]
## label correct
## <chr> <dbl>
## 1 1080Lines 0.929
## 2 1400Ripples 0
## 3 Air_Compressor 0.0588
## 4 Blip 0.931
## 5 Chirp 0
## 6 Extremely_Loud 0.901
## 7 Helix 0.833
## 8 Koi_Fish 0.851
## 9 Low_Frequency_Burst 0.875
## 10 Low_Frequency_Lines 0.341
## 11 No_Glitch 0.0233
## 12 Paired_Doves 0
## 13 Power_Line 0.985
## 14 Scattered_Light 0.911
## 15 Scratchy 0.6
## 16 Tomte 0.167
## 17 Violin_Mode 0.852
## 18 Wandering_Line 0.154
## 19 Whistle 0.637
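
For reference, a confusion matrix and the per-class accuracies can be computed from paired true and predicted labels as in the following Python sketch (not the author's code, which appears to be in R). The example reuses the counts from the 1080Lines and 1400Ripples rows of the matrix above; the helper names are hypothetical.

```python
from collections import Counter

def confusion_matrix(true_labels, predicted_labels):
    """Count how often each (true, predicted) pair occurs."""
    return Counter(zip(true_labels, predicted_labels))

def per_class_accuracy(matrix):
    """Fraction of each true class that was predicted as itself."""
    totals = Counter()
    for (true, _predicted), count in matrix.items():
        totals[true] += count
    # A Counter returns 0 for missing keys, so never-correct classes score 0.
    return {label: matrix[(label, label)] / total for label, total in totals.items()}

# Counts taken from the 1080Lines and 1400Ripples rows of the matrix above.
true_labels = ["1080Lines"] * 99 + ["1400Ripples"] * 22
predicted_labels = (
    ["1080Lines"] * 92
    + ["Violin_Mode"] * 6
    + ["Wandering_Line"] * 1
    + ["Violin_Mode"] * 22
)

matrix = confusion_matrix(true_labels, predicted_labels)
accuracy = per_class_accuracy(matrix)
print(accuracy)
```

Each accuracy is the diagonal entry divided by its row total, so 1080Lines scores 92/99 and 1400Ripples scores 0, in agreement with the table.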
We can see that, while this model predicts several glitch classes accurately, it fails to distinguish between the high-frequency broadband glitches: 1080 lines, 1400 ripples, wandering lines, whistles, and violin mode harmonics (VMHs; these were mostly predicted accurately, but at the cost of many whistles and every 1400 ripple being predicted as VMHs as well). It also fails to distinguish the low-to-mid-frequency glitches: air compressors, lower-frequency blips, tomtes, power lines, and scratchy glitches. However, looking at a spectrogram with the naked eye, one can easily distinguish between most of these classes, except for edge cases between blips and tomtes or between air compressors and power lines.

Most of our manual observations are reproduced by the model, with a few notable exceptions. Violin modes come in several different frequency bands, and the regression model tried to capture all of them within a single boundary, since they were given as one category. With some manual tweaking, one could easily split this class into multiple predicted classes, which, along with possibly allowing better predictions of 1400 ripples, should greatly improve the predictions of high-frequency glitches. The model also does particularly badly at predicting tomtes, low-frequency lines, and air compressor glitches. Some initial selection could possibly fix the ‘air compressor/power line’ problem, but wouldn’t necessarily help in classifying tomtes. The LFB-LFL distinction is nearly impossible to solve without data on how the glitch’s waveform evolves over time.
We also note that there is a ‘validation’ frame in the data, which we have not used yet. Since we are not performing cross-validation here, I will merge it with the testing data to provide a larger testing population.
From here, we apply these changes to the initial dataframe, and then reuse the code from the initial model.
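
One way to split the violin modes by frequency band is sketched below in Python (the analysis itself appears to be in R). The class names VMH_1000, VMH_1500, and VMH_2000 are those that appear in the confusion matrix that follows; the nearest-band rule, the band centres, and the `split_violin_mode` helper are assumptions, since the exact rule used is not shown.

```python
# Assumed band centres in Hz, based on the spikes seen near 1000, 1500 and 2000 Hz.
VMH_BANDS = {"VMH_1000": 1000.0, "VMH_1500": 1500.0, "VMH_2000": 2000.0}

def split_violin_mode(label, peak_frequency):
    """Relabel a Violin_Mode glitch by its nearest band; leave other labels alone."""
    if label != "Violin_Mode":
        return label
    return min(VMH_BANDS, key=lambda name: abs(VMH_BANDS[name] - peak_frequency))

# Hypothetical glitches: (label, peak frequency in Hz).
examples = [
    ("Violin_Mode", 1020.0),
    ("Violin_Mode", 1510.0),
    ("Violin_Mode", 1995.0),
    ("Blip", 1500.0),
]
relabelled = [split_violin_mode(label, freq) for label, freq in examples]
print(relabelled)
```

Only the training labels change under this scheme; the predictor variables stay the same, so the earlier modelling code can be reused unchanged.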
label | 1080Lines | Blip | Extremely_Loud | Helix | Koi_Fish | Low_Frequency_Burst | Low_Frequency_Lines | No_Glitch | Power_Line | Scattered_Light | Scratchy | Tomte | VMH_1500 | VMH_2000 | Wandering_Line | Whistle | 1400Ripples |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1080Lines | 93 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 3 | 2 | 0 | 0 |
1400Ripples | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 22 | 0 | 0 | 0 | 0 |
Air_Compressor | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 17 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Blip | 0 | 508 | 0 | 17 | 4 | 0 | 0 | 0 | 7 | 0 | 8 | 0 | 0 | 1 | 0 | 2 | 0 |
Chirp | 0 | 14 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
Extremely_Loud | 0 | 0 | 119 | 0 | 5 | 3 | 0 | 1 | 0 | 2 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
Helix | 0 | 5 | 0 | 78 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
Koi_Fish | 1 | 19 | 6 | 0 | 170 | 1 | 0 | 0 | 0 | 6 | 0 | 1 | 0 | 0 | 3 | 0 | 1 |
Low_Frequency_Burst | 1 | 0 | 2 | 0 | 0 | 141 | 30 | 0 | 0 | 8 | 0 | 2 | 0 | 0 | 0 | 0 | 0 |
Low_Frequency_Lines | 0 | 0 | 0 | 0 | 0 | 56 | 70 | 0 | 0 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
No_Glitch | 1 | 4 | 0 | 0 | 0 | 6 | 3 | 1 | 11 | 12 | 3 | 0 | 0 | 2 | 0 | 0 | 0 |
Paired_Doves | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 |
Power_Line | 0 | 0 | 0 | 4 | 0 | 0 | 0 | 3 | 126 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 |
Scattered_Light | 0 | 0 | 2 | 0 | 2 | 1 | 2 | 6 | 1 | 121 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Scratchy | 0 | 21 | 0 | 13 | 0 | 0 | 0 | 3 | 1 | 0 | 62 | 0 | 0 | 0 | 0 | 0 | 0 |
Tomte | 0 | 2 | 0 | 4 | 0 | 0 | 0 | 0 | 11 | 8 | 0 | 5 | 0 | 0 | 0 | 0 | 0 |
VMH_1000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 12 | 0 |
VMH_1500 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 58 | 0 | 0 | 7 | 0 |
VMH_2000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 51 | 0 | 0 | 0 |
Wandering_Line | 6 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 2 | 0 | 0 |
Whistle | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 0 | 25 | 0 | 0 | 58 | 0 |
## # A tibble: 21 × 2
## # Groups: label [21]
## label correct
## <chr> <dbl>
## 1 1080Lines 0.939
## 2 1400Ripples 0
## 3 Air_Compressor 0
## 4 Blip 0.929
## 5 Chirp 0
## 6 Extremely_Loud 0.908
## 7 Helix 0.929
## 8 Koi_Fish 0.817
## 9 Low_Frequency_Burst 0.766
## 10 Low_Frequency_Lines 0.530
## # ℹ 11 more rows
Ironically, after these changes not a single ‘air compressor’ or ‘1400 ripple’ glitch is classified correctly (although not predicting a class is probably better than consistently assigning other glitches to it while missing the real ones). Other combinations of the changes we implemented also tend to make the rest of the predictions less accurate, meaning that this may be our best bet at predicting the glitch classes. We can see that the overall accuracy of the model has actually decreased slightly. While some glitch classes (1080 Lines, Helixes, Low-Frequency Lines, Scratchy and, marginally, Extremely Loud glitches) are predicted better, many of the other common classes are predicted worse: Air Compressors, Blips, Koi Fish, Low-Frequency Bursts, Power Lines, and Scattered Light all lose accuracy, Tomtes are unchanged, and, apart from 1080 Lines, the high-frequency predictions have not improved. The largest gains are for Helixes and Low-Frequency Lines, at the expense of smaller losses spread across many other classes. We now calculate the overall accuracy of each model, and graph the predictions to compare the distributions of glitches from one last angle.
## # A tibble: 2 × 2
## correct n
## <chr> <int>
## 1 correct 1664
## 2 incorrect 462
## # A tibble: 2 × 2
## correct n
## <chr> <int>
## 1 correct 1663
## 2 incorrect 463
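
The two count tables above translate into overall accuracies as in this short Python sketch. The counts are taken from the tables; the first table is attributed to the first model because the diagonal of the first confusion matrix sums to 1664 and that of the second to 1663. The `overall_accuracy` helper is hypothetical.

```python
# Correct and incorrect counts, as printed in the two tables above.
counts = {
    "first model": {"correct": 1664, "incorrect": 462},
    "second model": {"correct": 1663, "incorrect": 463},
}

def overall_accuracy(tally):
    """Fraction of test glitches whose predicted label matches the true label."""
    return tally["correct"] / (tally["correct"] + tally["incorrect"])

accuracies = {name: overall_accuracy(tally) for name, tally in counts.items()}
for name, value in accuracies.items():
    print(f"{name}: {value:.4f}")
```

Both models are tested on 2126 glitches, and their accuracies differ by a single glitch.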
Overall, these two models are equally good: both predict between 3/4 and 4/5 of the glitches correctly, just with errors in different areas of the glitch hyperspace. It would therefore probably be better to choose the simpler of the two, to avoid unnecessary complexity. We now count the overall